AITopics

Country:

Europe > Switzerland > Zürich > Zürich (0.41)
North America > United States > Illinois > Cook County > Chicago (0.40)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(5 more...)

Genre: Research Report (0.67)

Industry:

Information Technology (0.93)
Energy (0.67)
Government > Regional Government (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Oct-8-2025, 17:00:45 GMT

rich Documents with Layout Annotations from Web Crawl Data

Maurice Weber (ETH Zurich), Carlo Siebenschuh (University of Chicago), Rory M. Butler

The first three authors contributed equally.

artificial intelligence, machine learning, natural language, (20 more...)

Sung Ju Hwang, Leonid Sigal

A Unified Semantic Embedding: Relating Taxonomies and Attributes

Neural Information Processing Systems, Oct-2-2025, 22:01:56 GMT

We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes. Contrary to prior work, which only utilized them as side information, we explicitly embed these semantic entities into the same space where we embed categories, which enables us to represent a category as their linear combination. By exploiting such a unified model for semantics, we enforce each category to be generated as a supercategory + a sparse combination of attributes, with an additional exclusive regularization to learn discriminative composition. The proposed reconstructive regularization guides the discriminative learning process to learn a model with better generalization. This model also generates a compact semantic description of each category, which enhances interoperability and enables humans to analyze what has been learned.
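The "supercategory + a sparse combination of attributes" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the embeddings are already given, fits only the reconstruction term with an L1 penalty via proximal gradient descent (ISTA), and leaves out the exclusive regularization and the joint discriminative training the abstract mentions. All names (`sparse_attribute_weights`, `l1`, `step`) and the toy vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def soft_threshold(x: float, t: float) -> float:
    """Shrink x by t and clip at zero, which keeps weights non-negative."""
    return max(x - t, 0.0)


def sparse_attribute_weights(
    category: Vector,
    supercategory: Vector,
    attributes: List[Vector],
    l1: float = 0.05,
    step: float = 0.1,
    iters: int = 500,
) -> List[float]:
    """Fit category ~ supercategory + sum_a beta[a] * attributes[a].

    Minimizes 0.5 * ||residual||^2 + l1 * sum(beta) over beta >= 0 with
    proximal gradient steps. `step` is assumed small enough for the given
    attribute vectors (it is for the orthonormal toy vectors below).
    """
    dims = range(len(category))
    # What the attributes have to explain once the supercategory is removed.
    target = [c - s for c, s in zip(category, supercategory)]
    beta = [0.0] * len(attributes)
    for _ in range(iters):
        recon = [sum(b * a[d] for b, a in zip(beta, attributes)) for d in dims]
        err = [r - t for r, t in zip(recon, target)]
        # Gradient of the squared reconstruction error w.r.t. each weight.
        grad = [sum(err[d] * a[d] for d in dims) for a in attributes]
        beta = [soft_threshold(b - step * g, step * l1) for b, g in zip(beta, grad)]
    return beta


# Toy example: a "tiger" embedding explained by a "feline" supercategory
# plus attribute directions "striped" and "spotted" (all values made up).
feline = [1.0, 0.0, 0.0]
striped = [0.0, 1.0, 0.0]
spotted = [0.0, 0.0, 1.0]
tiger = [1.0, 0.8, 0.0]

weights = sparse_attribute_weights(tiger, feline, [striped, spotted])
print([round(w, 3) for w in weights])  # prints [0.75, 0.0]
```

In this toy setting the irrelevant attribute gets exactly zero weight, so the category reads as "feline + striped", which is the kind of compact description the abstract refers to.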

category, regularization, semantic entity, (14 more...)

Country: Asia > South Korea > Ulsan > Ulsan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Aug-11-2025

NavComposer: Composing Language Instructions for Navigation Trajectories through Action-Scene-Object Modularization

He, Zongtao; Wang, Liuyi; Chen, Lu; Liu, Chengju; Chen, Qijun

Language-guided navigation is a cornerstone of embodied AI, enabling agents to interpret language instructions and navigate complex environments. However, expert-provided instructions are limited in quantity, while synthesized annotations often lack quality, making them insufficient for large-scale research. To address this, we propose NavComposer, a novel framework for automatically generating high-quality navigation instructions. NavComposer explicitly decomposes semantic entities such as actions, scenes, and objects, and recomposes them into natural language instructions. Its modular architecture allows flexible integration of state-of-the-art techniques, while the explicit use of semantic entities enhances both the richness and accuracy of instructions. Moreover, it operates in a data-agnostic manner, supporting adaptation to diverse navigation trajectories without domain-specific training. Complementing NavComposer, we introduce NavInstrCritic, a comprehensive annotation-free evaluation system that assesses navigation instructions on three dimensions: contrastive matching, semantic consistency, and linguistic diversity. NavInstrCritic provides a holistic evaluation of instruction quality, addressing limitations of traditional metrics that rely heavily on expert annotations. By decoupling instruction generation and evaluation from specific navigation agents, our method enables more scalable and generalizable research. Extensive experiments provide direct and practical evidence for the effectiveness of our method.

large language model, machine learning, natural language, (18 more...)

doi: 10.1109/TCSVT.2025.3596386

arXiv: 2507.10894

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Dasgupta, Subhasis; Stephens, Jon; Gupta, Amarnath

OLG++: A Semantic Extension of Obligation Logic Graph

arXiv.org Artificial Intelligence, Jul-9-2025

We present OLG++, a semantic extension of the Obligation Logic Graph (OLG) for modeling regulatory and legal rules in municipal and interjurisdictional contexts. OLG++ introduces richer node and edge types, including spatial, temporal, party group, defeasibility, and logical grouping constructs, enabling nuanced representations of legal obligations, exceptions, and hierarchies. The model supports structured reasoning over rules with contextual conditions, precedence, and complex triggers. We demonstrate its expressiveness through examples from food business regulations, showing how OLG++ supports legal question answering using property graph queries. OLG++ also improves over LegalRuleML by providing native support for subClassOf, spatial constraints, and reified exception structures. Our examples show that OLG++ is more expressive than prior graph-based models for legal knowledge representation.
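To make the idea of answering legal questions with property graph queries more concrete, here is a minimal sketch in plain Python. It is not the OLG++ schema: the node labels (`Obligation`, `Exception`, `PartyGroup`), the edge names (`APPLIES_TO`, `DEFEATS`, `SUBCLASS_OF`), and the food-vendor rule are all hypothetical, chosen only to mirror the subclass and exception constructs named in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    id: str
    label: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    src: str
    rel: str
    dst: str


class PropertyGraph:
    """A tiny in-memory property graph (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, id: str, label: str, **props: object) -> None:
        self.nodes[id] = Node(id, label, dict(props))

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self.edges.append(Edge(src, rel, dst))

    def targets(self, src: str, rel: str) -> List[str]:
        return [e.dst for e in self.edges if e.src == src and e.rel == rel]

    def sources(self, dst: str, rel: str) -> List[str]:
        return [e.src for e in self.edges if e.dst == dst and e.rel == rel]


def with_superclasses(g: PropertyGraph, party: str) -> Set[str]:
    """Return the party group plus everything reachable via SUBCLASS_OF."""
    seen: Set[str] = set()
    stack = [party]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(g.targets(cur, "SUBCLASS_OF"))
    return seen


def active_obligations(g: PropertyGraph, party: str, context: Dict[str, object]) -> List[str]:
    """Obligations that apply to the party and are not defeated in this context.

    An exception defeats an obligation when every property on the exception
    node matches the context (so an exception with no properties always applies).
    """
    groups = with_superclasses(g, party)
    result = []
    for node in g.nodes.values():
        if node.label != "Obligation":
            continue
        if not groups.intersection(g.targets(node.id, "APPLIES_TO")):
            continue
        defeated = any(
            all(context.get(k) == v for k, v in g.nodes[exc].props.items())
            for exc in g.sources(node.id, "DEFEATS")
        )
        if not defeated:
            result.append(node.id)
    return result


def build_example() -> PropertyGraph:
    """Hypothetical food-business rule with one reified exception."""
    g = PropertyGraph()
    g.add_node("FoodBusiness", "PartyGroup")
    g.add_node("MobileFoodVendor", "PartyGroup")
    g.add_edge("MobileFoodVendor", "SUBCLASS_OF", "FoodBusiness")
    g.add_node("obtain_health_permit", "Obligation")
    g.add_edge("obtain_health_permit", "APPLIES_TO", "FoodBusiness")
    g.add_node("temporary_event_exemption", "Exception", temporary_event=True)
    g.add_edge("temporary_event_exemption", "DEFEATS", "obtain_health_permit")
    return g


g = build_example()
print(active_obligations(g, "MobileFoodVendor", {}))
print(active_obligations(g, "MobileFoodVendor", {"temporary_event": True}))
```

The first query inherits the obligation through the subclass edge; the second shows the exception node overriding it. A real property graph store would express the same traversal in its own query language.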

artificial intelligence, natural language, obligation, (15 more...)

arXiv: 2507.05488

Country: North America > United States > California > San Diego County (0.17)

Genre: Research Report (0.50)

Industry: Law > Statutes (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.47)

arXiv.org Artificial Intelligence, Jun-3-2025

VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding

Ding, Yihao; Han, Soyeon Caren; Li, Yan; Poon, Josiah

Visually Rich Document Understanding (VRDU) has emerged as a critical field in document intelligence, enabling automated extraction of key information from complex documents across domains such as medical, financial, and educational applications. However, form-like documents pose unique challenges due to their complex layouts, multi-stakeholder involvement, and high structural variability. Addressing these issues, the VRD-IU Competition was introduced, focusing on extracting and localizing key information from multi-format forms within the Form-NLU dataset, which includes digital, printed, and handwritten documents. This paper presents insights from the competition, which featured two tracks: Track A, emphasizing entity-based key information retrieval, and Track B, targeting end-to-end key information localization from raw document images. With over 20 participating teams, the competition showcased various state-of-the-art methodologies, including hierarchical decomposition, transformer-based retrieval, multimodal feature fusion, and advanced object detection techniques. The top-performing models set new benchmarks in VRDU, providing valuable insights into document intelligence.

information, large language model, machine learning, (16 more...)

arXiv: 2506.01388

Country:

Asia > South Korea (0.16)
Europe > Switzerland (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

arXiv.org Artificial Intelligence, Apr-10-2025

RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration

Alama, Omar; Bhattacharya, Avigyan; He, Haoyang; Kim, Seungchan; Qiu, Yuheng; Wang, Wenshan; Ho, Cherie; Keetha, Nikhil; Scherer, Sebastian

Open-set semantic mapping is crucial for open-world robots. Current mapping approaches are either limited by the depth range or map beyond-range entities only in constrained settings; overall, they fail to combine within-range and beyond-range observations. Furthermore, these methods trade off fine-grained semantics against efficiency. We introduce RayFronts, a unified representation that enables both dense and beyond-range efficient semantic mapping. RayFronts encodes task-agnostic open-set semantics into both in-range voxels and beyond-range rays encoded at map boundaries, empowering the robot to reduce search volumes significantly and make informed decisions both within and beyond sensory range, while running at 8.84 Hz on an Orin AGX. Benchmarking the within-range semantics shows that RayFronts's fine-grained image encoding provides 1.34x zero-shot 3D semantic segmentation performance while improving throughput by 16.5x. Traditionally, online mapping performance is entangled with other system components, complicating evaluation. We propose a planner-agnostic evaluation framework that captures the utility for online beyond-range search and exploration, and show that RayFronts reduces search volume 2.2x more efficiently than the closest online baselines.

frontier, machine learning, natural language, (17 more...)

arXiv: 2504.06994

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Sung Ju Hwang, Leonid Sigal

A Unified Semantic Embedding: Relating Taxonomies and Attributes

Neural Information Processing Systems, Feb-9-2025, 23:39:58 GMT

We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes. Contrary to prior work, which only utilized them as side information, we explicitly embed these semantic entities into the same space where we embed categories, which enables us to represent a category as their linear combination. By exploiting such a unified model for semantics, we enforce each category to be generated as a supercategory + a sparse combination of attributes, with an additional exclusive regularization to learn discriminative composition. The proposed reconstructive regularization guides the discriminative learning process to learn a model with better generalization. This model also generates a compact semantic description of each category, which enhances interoperability and enables humans to analyze what has been learned.

category, machine learning, natural language, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > South Korea > Ulsan > Ulsan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Apr-19-2024

PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

Ding, Yihao; Ren, Kaixuan; Huang, Jiabin; Luo, Siwen; Han, Soyeon Caren

Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRD), particularly those dominated by lengthy textual content like research journal articles. Existing studies primarily focus on real-world documents with sparse text, while challenges persist in comprehending the hierarchical semantic relations among multiple pages to locate multimodal components. To address this gap, we propose PDF-MVQA, which is tailored for research journal articles, encompassing multiple pages and multimodal information retrieval. Unlike traditional machine reading comprehension (MRC) tasks, our approach aims to retrieve entire paragraphs containing answers or visually rich document entities like tables and figures. Our contributions include the introduction of a comprehensive PDF Document VQA dataset, allowing the examination of semantically hierarchical layout structures in text-dominant documents. We also present new VRD-QA frameworks designed to grasp textual contents and relations among document layouts simultaneously, extending page-level understanding to the entire multi-page document. Through this work, we aim to enhance the capabilities of existing vision-and-language models in handling challenges posed by text-dominant documents in VRD-QA.

document entity, information, representation, (17 more...)